Synthesis of missing units in a Telugu text-to-speech system
Abstract
Synthesis of complex consonant clusters is a challenge in TTS systems, as it is difficult to ensure proper coarticulation. This problem surfaces in the synthesis of missing units in unit concatenation systems, where back-off techniques are used to synthesize these missing units. The current back-off techniques, which use smaller sub-word units to synthesize the missing unit, are unsuitable for frequent use because they cause severe degradation in synthesis quality. However, missing units occur frequently when synthesizing unrestricted text. Hence, to build open-domain TTS systems, efficient back-off techniques are necessary. The problem is further aggravated for under-resourced languages, which lack the manually annotated speech corpora that enable the development of such high-quality synthesis techniques. In this thesis we propose a two-pronged strategy to improve the quality of back-off techniques: refining the boundaries of units in the acoustic inventory to reduce phone insertion errors during back-off, and using a novel back-off technique to ensure coarticulation consistency. To increase the quality of unit boundaries, we propose a boundary refinement algorithm that removes the need for manually labelled data by relying on a knowledge base of cost functions derived from landmark-specific acoustic cues. However, the increase in accuracy of unit boundaries is insufficient when context-dependent units are not available in the database. Hence we propose a back-off technique that synthesizes complex consonant clusters even when the required context-dependent units are not available. This technique is motivated by a phenomenon observed during second language acquisition and emulates native speaker intuition in the synthesis of these clusters. The technique was shown to be better than the conventional back-off techniques. Further, it was shown to be robust to segmentation errors, reducing the reliance on high-quality segmentation techniques that require manually labelled data.
Thesis Supervisor: Dr. Kishore S. Prahallad
Title: Assistant Professor
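As a rough illustration of the conventional back-off that the abstract describes (covering a missing unit with smaller sub-word units), the following minimal Python sketch may help. It is not the technique proposed in the thesis. The names `Unit` and `back_off`, the phone labels, and the greedy longest-match strategy are all assumptions made for illustration only.

```python
from typing import List, Set, Tuple

# Hypothetical representation: a unit is a sequence of phone labels.
Unit = Tuple[str, ...]


def back_off(unit: Unit, inventory: Set[Unit]) -> List[Unit]:
    """Cover `unit` with the longest sub-units found in `inventory`.

    Greedy longest-match is an assumption of this sketch, not a
    description of any particular TTS system.
    """
    if unit in inventory:
        return [unit]  # the unit is not missing, no back-off needed

    pieces: List[Unit] = []
    start = 0
    while start < len(unit):
        # Try the longest remaining span first, then shrink it.
        for end in range(len(unit), start, -1):
            candidate = unit[start:end]
            if candidate in inventory:
                pieces.append(candidate)
                start = end
                break
        else:
            # Not even the single phone at `start` is in the inventory.
            raise KeyError(f"no unit covers phone {unit[start]!r}")
    return pieces


# Toy inventory with made-up phone labels.
inventory: Set[Unit] = {("s",), ("t",), ("r",), ("a",), ("s", "t")}
pieces = back_off(("s", "t", "r", "a"), inventory)
```

Each boundary between the returned pieces is a concatenation point taken from a different recorded context, which is where the coarticulation problems mentioned in the abstract can arise.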
Similar resources
Speech Synthesis System for Telugu Language
A system which takes a sequence of words as input and converts them to speech. Vowels and consonants are most important in the Telugu language. The voices are sampled from real recorded speech. The speech synthesis is handled by computers and mobile phones. To build a natural-sounding speech synthesis system, it is essential that the text processing component produce an appropriate sequence of phonemi...
A text-to-speech synthesis system for Telugu
In this paper, a diphone based Text-to-Speech (TTS) system for the Telugu language is presented. Telugu is one of the main south-Indian languages spoken by more than 100 million people. Speech output is generated using the Festival Speech Synthesis System and the MBROLA synthesis engine. The design and collection of diphones and voice building process are described. Our text analysis module, th...
Telugu Text to Speech System for Mobile based Systems
Speech is an important mode of communication and a current research topic. Work has mostly focused on synthesis and analysis. Apart from synthesis, a text-to-speech system is developed. Speech synthesis is the artificial production of human speech. A text-to-speech (TTS) system converts arbitrary text into speech. In India different languages have been spoken each bein...
Experiments with Unit Selection Speech Databases for Indian Languages
This paper presents a brief overview of unit selection speech synthesis and discusses the issues relevant to the development of voices for Indian languages. We discuss a few perceptual experiments conducted on Hindi and Telugu voices. 1 Role of Language Technologies Most of the information in the digital world is accessible to a few who can read or understand a particular language. Language technolog...
Is word-to-phone mapping better than phone-phone mapping for handling English words?
In this paper, we revisit the problem of pronouncing English words using a native phone set. Specifically, we investigate methods of pronouncing English words using the Telugu phone set in the context of Telugu Text-to-Speech. We compare phone-phone substitution and word-phone mapping for pronunciation of English words using Telugu phones. We are not considering other than native language phones...
Publication date: 2011